Deep Gaussian Processes
In this paper we introduce deep Gaussian process (GP) models. Deep GPs are a
deep belief network based on Gaussian process mappings. The data is modeled as
the output of a multivariate GP. The inputs to that Gaussian process are then
governed by another GP. A single layer model is equivalent to a standard GP or
the GP latent variable model (GP-LVM). We perform inference in the model by
approximate variational marginalization. This results in a strict lower bound
on the marginal likelihood of the model which we use for model selection
(number of layers and nodes per layer). Deep belief networks are typically
applied to relatively large data sets using stochastic gradient descent for
optimization. Our fully Bayesian treatment allows for the application of deep
models even when data is scarce. Model selection by our variational bound shows
that a five layer hierarchy is justified even when modelling a digit data set
containing only 150 examples.
Comment: 9 pages, 8 figures. Appearing in Proceedings of the 16th
International Conference on Artificial Intelligence and Statistics (AISTATS)
201
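As an illustration of the layered construction described in the abstract above (the data is the output of a GP whose inputs are themselves governed by another GP), the following minimal sketch draws one sample from a stacked GP prior. The RBF kernel, the function names, the jitter value, and the layer sizes are all assumptions made for illustration; the sketch does not implement the paper's variational inference or model selection.

```python
import numpy as np


def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix for the rows of X (assumed kernel choice)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale ** 2)


def sample_gp_layer(X, out_dim, rng, jitter=1e-6):
    """Draw out_dim independent zero-mean GP function samples evaluated at X."""
    K = rbf_kernel(X) + jitter * np.eye(X.shape[0])  # jitter keeps Cholesky stable
    chol = np.linalg.cholesky(K)
    return chol @ rng.standard_normal((X.shape[0], out_dim))


def sample_deep_gp_prior(X, layer_dims, seed=0):
    """Feed the output of each GP layer in as the input of the next one."""
    rng = np.random.default_rng(seed)
    H = X
    for out_dim in layer_dims:
        H = sample_gp_layer(H, out_dim, rng)
    return H
```

For example, calling `sample_deep_gp_prior(np.linspace(-3, 3, 50)[:, None], [2, 1])` passes 50 one-dimensional inputs through a hypothetical two-dimensional hidden layer and returns a 50-by-1 array of outputs. With `layer_dims` of length one, the sketch reduces to a draw from a standard GP prior.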
Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis
Factor analysis aims to determine latent factors, or traits, which summarize
a given data set. Inter-battery factor analysis extends this notion to multiple
views of the data. In this paper we show how a nonlinear, nonparametric version
of these models can be recovered through the Gaussian process latent variable
model. This gives us a flexible formalism for multi-view learning where the
latent variables can be used both for exploratory purposes and for learning
representations that enable efficient inference for ambiguous estimation tasks.
Learning is performed in a Bayesian manner through the formulation of a
variational compression scheme which gives a rigorous lower bound on the log
likelihood. Our Bayesian framework provides strong regularization during
training, allowing the structure of the latent space to be determined
efficiently and automatically. We demonstrate this by producing the first (to
our knowledge) published results of learning from dozens of views, even when
data is scarce. We further show experimental results on several different types
of multi-view data sets and for different kinds of tasks, including exploratory
data analysis, generation, ambiguity modelling through latent priors and
classification.
Comment: 49 pages including appendi
Batch Bayesian Optimization via Local Penalization
The popularity of Bayesian optimization methods for efficient exploration of
parameter spaces has led to a series of papers applying Gaussian processes as
surrogates in the optimization of functions. However, most proposed approaches
only allow the exploration of the parameter space to occur sequentially. Often,
it is desirable to simultaneously propose batches of parameter values to
explore. This is particularly the case when large parallel processing
facilities are available. These facilities could be computational or physical
facets of the process being optimized. For example, in biological experiments,
many experimental setups allow several samples to be processed simultaneously.
Batch methods, however, require modeling of the interaction between the
evaluations in the batch, which can be expensive in complex scenarios. We
investigate a simple heuristic based on an estimate of the Lipschitz constant
that captures the most important aspect of this interaction (i.e. local
repulsion) at negligible computational overhead. The resulting algorithm
compares well, in running time, with much more elaborate alternatives. The
approach assumes that the function of interest is a Lipschitz continuous
function. A wrap-loop around the acquisition function is used to collect
batches of points of a certain size, minimizing the non-parallelizable
computational effort. The speed-up of our method with respect to previous
approaches is significant in a set of computationally expensive experiments.
Comment: 11 pages, 10 figure
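The wrap-loop described in the abstract above can be sketched as follows: pick the maximizer of the acquisition function, damp the acquisition around that point with a penalizer driven by a Lipschitz estimate, and repeat until the batch is full. The abstract does not give the form of the penalizer, so the Gaussian-tail penalizer below, the finite-difference Lipschitz estimate, the use of the largest surrogate mean as the "best" value (maximization is assumed), and all function names are assumptions for illustration on a 1-D candidate grid.

```python
import math

import numpy as np

_erfc = np.vectorize(math.erfc)  # element-wise erfc without extra dependencies


def estimate_lipschitz(x, mu):
    """Crude Lipschitz estimate: largest finite-difference slope of the surrogate mean."""
    return float(np.max(np.abs(np.gradient(mu, x))))


def local_penalizer(x, x_j, mu_j, sigma_j, lipschitz, best):
    """Soft exclusion zone around x_j: values in [0, 1], increasing with distance from x_j."""
    z = (lipschitz * np.abs(x - x_j) - best + mu_j) / (math.sqrt(2.0) * sigma_j)
    return 0.5 * _erfc(-z)


def select_batch(x, acquisition, mu, sigma, batch_size):
    """Greedy wrap-loop: maximize, penalize locally, repeat.

    Assumes the acquisition values are strictly positive, so multiplying
    by the penalizer only ever reduces them.
    """
    lipschitz = estimate_lipschitz(x, mu)
    best = float(np.max(mu))  # assumed proxy for the optimum of the objective
    penalized = np.asarray(acquisition, dtype=float).copy()
    batch = []
    for _ in range(batch_size):
        j = int(np.argmax(penalized))
        batch.append(j)
        penalized = penalized * local_penalizer(
            x, x[j], mu[j], sigma[j], lipschitz, best
        )
    return batch
```

The returned list holds indices into the candidate grid `x`. Only the surrogate's mean and standard deviation at already selected points are needed inside the loop, which is why the extra cost per batch element stays small in this sketch.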